NSF PAR Search | NSF Public Access Repository

Leviathan: A Unified System for General-Purpose Near-Data Computing

https://doi.org/10.1109/MICRO61859.2024.00095

Schwedock, Brian C.; Beckmann, Nathan (November 2024, IEEE)

The rising cost of data movement poses a significant challenge to future computing systems. The call to arms for novel data-centric systems has spawned a wave of near-data computing (NDC) architectures that move compute closer to data. Despite large benefits promised by NDC, prior designs suffer from limited applicability and difficult programming. This paper identifies the commonalities and differences across NDC designs to develop Leviathan, a unified architecture and programming interface for near-cache NDC. We build a taxonomy of NDC and identify the key dimensions as what, where, and when to compute. Leviathan provides a simple reactive-programming interface and automatically executes actions near data at the right time and place. The ability to integrate multiple NDC paradigms makes Leviathan the only general-purpose system to support a variety of specialized NDC designs. Across a range of NDC-specialized applications, Leviathan improves performance by 1.5×–3.7× and reduces energy by 22%–77% vs. a baseline multicore, while adding only ≈6% area compared to the last-level cache.
Kobold: Simplified Cache Coherence for Cache-Attached Accelerators

https://doi.org/10.1109/LCA.2023.3269399

Brana, Jennifer; Schwedock, Brian C.; Manerkar, Yatin A.; Beckmann, Nathan (January 2023, IEEE Computer Architecture Letters)

täkō: a polymorphic cache hierarchy for general-purpose optimization of data movement

https://doi.org/10.1145/3470496.3527379

Schwedock, Brian C.; Yoovidhya, Piratach; Seibert, Jennifer; Beckmann, Nathan (June 2022, International Symposium on Computer Architecture)

Current systems hide data movement from software behind the load-store interface. Software’s inability to observe and respond to data movement is the root cause of many inefficiencies, including the growing fraction of execution time and energy devoted to data movement itself. Recent specialized memory-hierarchy designs prove that large data-movement savings are possible. However, these designs require custom hardware, raising a large barrier to their practical adoption. This paper argues that the hardware-software interface is the problem, and custom hardware is often unnecessary with an expanded interface. The täkō architecture lets software observe data movement and interpose when desired. Specifically, caches in täkō can trigger software callbacks in response to misses, evictions, and writebacks. Callbacks run on reconfigurable dataflow engines placed near caches. Five case studies show that this interface covers a wide range of data-movement features and optimizations. Microarchitecturally, täkō is similar to recent near-data computing designs, adding ≈5% area to a baseline multicore. täkō improves performance by 1.4×–4.2×, similar to prior custom hardware designs, and comes within 1.8% of an idealized implementation.
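
The callback interface described in this abstract can be illustrated with a small software model. The sketch below is only a toy, not the täkō hardware or its actual API: the class `CallbackCache`, its method names, and the callback signatures are all hypothetical, and a Python dict stands in for the next level of the memory hierarchy. It shows an LRU cache that notifies software on misses, evictions, and writebacks.

```python
from collections import OrderedDict


class CallbackCache:
    """Toy LRU cache (hypothetical) that fires callbacks on data-movement events."""

    def __init__(self, capacity, backing,
                 on_miss=None, on_eviction=None, on_writeback=None):
        self.capacity = capacity
        self.backing = backing        # dict standing in for the next memory level
        self.lines = OrderedDict()    # addr -> (value, dirty), oldest entry first
        self.on_miss = on_miss or (lambda addr: None)
        self.on_eviction = on_eviction or (lambda addr, value: None)
        self.on_writeback = on_writeback or (lambda addr, value: None)

    def _fill(self, addr):
        """Bring addr into the cache, evicting the LRU line if the cache is full."""
        self.on_miss(addr)
        if len(self.lines) >= self.capacity:
            victim, (value, dirty) = self.lines.popitem(last=False)
            self.on_eviction(victim, value)
            if dirty:
                # Only modified lines are written back to the backing store.
                self.on_writeback(victim, value)
                self.backing[victim] = value
        self.lines[addr] = (self.backing.get(addr, 0), False)

    def load(self, addr):
        if addr not in self.lines:
            self._fill(addr)
        self.lines.move_to_end(addr)  # mark as most recently used
        return self.lines[addr][0]

    def store(self, addr, value):
        if addr not in self.lines:
            self._fill(addr)
        self.lines[addr] = (value, True)
        self.lines.move_to_end(addr)


# Record every event the cache reports to software.
events = []
cache = CallbackCache(
    capacity=2,
    backing={},
    on_miss=lambda addr: events.append(("miss", addr)),
    on_eviction=lambda addr, value: events.append(("evict", addr)),
    on_writeback=lambda addr, value: events.append(("writeback", addr)),
)
cache.store(0, 10)  # miss on 0, line becomes dirty
cache.load(1)       # miss on 1
cache.load(2)       # miss on 2, evicts dirty line 0 and writes it back
```

In this model the callbacks only log events. The abstract's point is that software could instead use such hooks to interpose on data movement, for example by transforming data as it is written back.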
Jumanji: The Case for Dynamic NUCA in the Datacenter

https://doi.org/10.1109/MICRO50266.2020.00061

Schwedock, Brian C.; Beckmann, Nathan (October 2020, MICRO)
The datacenter introduces new challenges for computer systems around tail latency and security. This paper argues that dynamic NUCA techniques are a better solution to these challenges than prior cache designs. We show that dynamic NUCA designs can meet tail-latency deadlines with much less cache space than prior work, and that they also provide a natural defense against cache attacks. Unfortunately, prior dynamic NUCAs have missed these opportunities because they focus exclusively on reducing data movement. We present Jumanji, a dynamic NUCA technique designed for tail latency and security. We show that prior last-level cache designs are vulnerable to new attacks and offer imperfect performance isolation. Jumanji solves these problems while significantly improving performance of co-running batch applications. Moreover, Jumanji only requires lightweight hardware and a few simple changes to system software, similar to prior D-NUCAs. At 20 cores, Jumanji improves batch weighted speedup by 14% on average, vs. just 2% for a non-NUCA design with weaker security, and is within 2% of an idealized design.